Overview
Dataset statistics
| Number of variables | 20 |
|---|---|
| Number of observations | 2057992 |
| Missing cells | 2650580 |
| Missing cells (%) | 6.4% |
| Duplicate rows | 6707 |
| Duplicate rows (%) | 0.3% |
| Total size in memory | 314.0 MiB |
| Average record size in memory | 160.0 B |
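The percentages and the average record size in the table above follow from the raw counts by simple arithmetic. The sketch below recomputes them in plain Python. The variable names are illustrative, and the assumption that the missing-cell share is taken over observations × variables is inferred from the numbers rather than stated in the report.

```python
# Raw figures copied from the Dataset statistics table.
n_variables = 20
n_observations = 2_057_992
missing_cells = 2_650_580
duplicate_rows = 6_707
total_size_mib = 314.0

# Assumption: the missing-cell share is relative to all cells (rows x columns).
total_cells = n_observations * n_variables
missing_pct = 100 * missing_cells / total_cells

# Duplicate share is relative to the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Average record size: total memory (1 MiB = 1024**2 bytes) per observation.
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the three printed values round to the 6.4%, 0.3%, and 160.0 B shown in the table.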
Variable types
| Text | 5 |
|---|---|
| Numeric | 6 |
| Categorical | 5 |
| DateTime | 4 |
Alerts
| Alert | Type |
|---|---|
| Dataset has 6707 (0.3%) duplicate rows | Duplicates |
| arrival_delay_check is highly overall correlated with departure_delay_check | High correlation |
| arrival_delay_m is highly overall correlated with departure_delay_m | High correlation |
| departure_delay_check is highly overall correlated with arrival_delay_check | High correlation |
| departure_delay_m is highly overall correlated with arrival_delay_m | High correlation |
| eva_nr is highly overall correlated with long and 2 other fields | High correlation |
| info is highly overall correlated with lat and 3 other fields | High correlation |
| lat is highly overall correlated with info and 2 other fields | High correlation |
| long is highly overall correlated with eva_nr and 2 other fields | High correlation |
| state is highly overall correlated with eva_nr and 4 other fields | High correlation |
| zip is highly overall correlated with eva_nr and 3 other fields | High correlation |
| arrival_delay_check is highly imbalanced (69.8%) | Imbalance |
| departure_delay_check is highly imbalanced (69.7%) | Imbalance |
| path has 211069 (10.3%) missing values | Missing |
| arrival_plan has 211069 (10.3%) missing values | Missing |
| arrival_change has 474922 (23.1%) missing values | Missing |
| departure_change has 339378 (16.5%) missing values | Missing |
| info has 1414133 (68.7%) missing values | Missing |
| arrival_delay_m has 1404435 (68.2%) zeros | Zeros |
| departure_delay_m has 1335764 (64.9%) zeros | Zeros |
Reproduction
| Analysis started | 2024-12-07 08:47:40.873467 |
|---|---|
| Analysis finished | 2024-12-07 08:49:11.636423 |
| Duration | 1 minute and 30.76 seconds |
| Software version | ydata-profiling v4.12.0 |
| Download configuration | config.json |
Variables
ID
Text
| Distinct | 2026585 |
|---|---|
| Distinct (%) | 98.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 MiB |
Length
| Max length | 34 |
|---|---|
| Median length | 33 |
| Mean length | 32.847704 |
| Min length | 28 |
Unique
| Unique | 1995178 |
|---|---|
| Unique (%) | 96.9% |
Sample
| 1st row | 1573967790757085557-2407072312-14 |
|---|---|
| 2nd row | 349781417030375472-2407080017-1 |
| 3rd row | 7157250219775883918-2407072120-25 |
| 4th row | 349781417030375472-2407080017-2 |
| 5th row | 1983158592123451570-2407080010-3 |
Words
| Value | Count | Frequency (%) |
| 45229876672715008-2407111044-6 | 2 | < 0.1% |
| 8093000697463236928-2407102014-13 | 2 | < 0.1% |
| 2684232165437648261-2407131705-7 | 2 | < 0.1% |
| 1131061763278260844-2407092238-8 | 2 | < 0.1% |
| 4058401902925227636-2407091905-26 | 2 | < 0.1% |
| 3908068791773353441-2407092106-7 | 2 | < 0.1% |
| 2156867993929598961-2407121140-7 | 2 | < 0.1% |
| 5274685741865861980-2407120835-6 | 2 | < 0.1% |
| 4720559371039705295-2407080045-2 | 2 | < 0.1% |
| 7755496932300555138-2407121153-4 | 2 | < 0.1% |
| Other values (2026575) | 2057972 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 8482941 | |
| 0 | 8372506 | |
| 2 | 7783856 | |
| 4 | 7196516 | |
| 7 | 6513910 | |
| 3 | 5323158 | |
| - | 5149715 | |
| 8 | 4902097 | |
| 5 | 4832920 | |
| 6 | 4523564 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 67600313 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 67600313 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 67600313 |
line
Text
| Distinct | 296 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 MiB |
Length
| Max length | 5 |
|---|---|
| Median length | 1 |
| Mean length | 1.6071987 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 20 |
|---|---|
| 2nd row | 18 |
| 3rd row | 1 |
| 4th row | 18 |
| 5th row | 33 |
Words
| Value | Count | Frequency (%) |
| 1 | 327015 | 15.9% |
| 2 | 170604 | 8.3% |
| 3 | 167018 | 8.1% |
| 6 | 117296 | 5.7% |
| 5 | 102742 | 5.0% |
| 8 | 98742 | 4.8% |
| 7 | 75469 | 3.7% |
| 4 | 71594 | 3.5% |
| 9 | 65811 | 3.2% |
| 42 | 42352 | 2.1% |
| Other values (284) | 819366 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 642474 | |
| 2 | 400714 | |
| 3 | 316028 | |
| 4 | 283504 | |
| 5 | 272022 | |
| 6 | 263607 | |
| 8 | 206698 | 6.2% |
| 7 | 195654 | 5.9% |
| R | 194509 | 5.9% |
| 9 | 139102 | 4.2% |
| Other values (23) | 393290 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 3307602 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 3307602 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 3307602 |
path
Text
Missing 
| Distinct | 22142 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 211069 |
| Missing (%) | 10.3% |
| Memory size | 15.7 MiB |
Length
| Max length | 1229 |
|---|---|
| Median length | 626 |
| Mean length | 181.46272 |
| Min length | 4 |
Unique
| Unique | 1121 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | Stolberg(Rheinl)Hbf Gl.44|Eschweiler-St.Jöris|Alsdorf Poststraße|Alsdorf-Mariadorf|Alsdorf-Kellersberg|Alsdorf-Annapark|Alsdorf-Busch|Herzogenrath-August-Schmidt-Platz|Herzogenrath-Alt-Merkstein|Herzogenrath|Kohlscheid|Aachen West|Aachen Schanz |
|---|---|
| 2nd row | Hamm(Westf)Hbf|Kamen|Kamen-Methler|Dortmund-Kurl|Dortmund-Scharnhorst|Dortmund Hbf|Bochum Hbf|Wattenscheid|Essen Hbf|Mülheim(Ruhr)Hbf|Duisburg Hbf|Düsseldorf Flughafen|Düsseldorf Hbf|Düsseldorf-Benrath|Leverkusen Mitte|Köln-Mülheim|Köln Messe/Deutz|Köln Hbf|Köln-Ehrenfeld|Horrem|Düren|Langerwehe|Eschweiler Hbf|Stolberg(Rheinl)Hbf |
| 3rd row | Aachen Hbf |
| 4th row | Herzogenrath|Kohlscheid |
| 5th row | Herzogenrath |
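The sample rows suggest that `path` stores the stations along a route as a single string delimited by `|`. The following is a minimal sketch of how such a value could be split into individual stops, assuming that delimiter holds for the whole column. `split_path` is a hypothetical helper, not part of the dataset or of ydata-profiling.

```python
def split_path(path):
    """Split a pipe-delimited path string into station names.

    Assumes '|' is the delimiter, as the sample rows suggest. Missing
    values (None) and empty strings yield an empty list.
    """
    if not path:
        return []
    return [stop.strip() for stop in path.split("|") if stop.strip()]


# Sample values taken from the rows above.
print(split_path("Herzogenrath|Kohlscheid"))  # ['Herzogenrath', 'Kohlscheid']
print(split_path("Aachen Hbf"))               # ['Aachen Hbf']
print(split_path(None))                       # []
```

Splitting on the delimiter rather than on whitespace avoids fused tokens such as `s)|berlin`, which appear in the word frequency table for this variable.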
Words
| Value | Count | Frequency (%) |
| hbf | 456103 | 3.4% |
| s)|berlin | 449124 | 3.4% |
| allee|berlin | 163190 | 1.2% |
| berlin | 111135 | 0.8% |
| friedrichstraße | 95783 | 0.7% |
| straße|berlin | 90152 | 0.7% |
| am | 89413 | 0.7% |
| flughafen | 86299 | 0.7% |
| ostkreuz | 85078 | 0.6% |
| rosenheimer | 81764 | 0.6% |
| Other values (13355) | 11538441 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 35962727 | 10.7% |
| r | 26391616 | 7.9% |
| n | 25364913 | 7.6% |
| a | 17811571 | 5.3% |
| | | 17622361 | 5.3% |
| i | 15137595 | 4.5% |
| l | 15100658 | 4.5% |
| t | 14035768 | 4.2% |
| s | 12308835 | 3.7% |
| h | 11778931 | 3.5% |
| Other values (67) | 143632691 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 335147666 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 335147666 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 335147666 |
eva_nr
Real number (ℝ)
High correlation 
| Distinct | 1996 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8018251.2 |
| Minimum | 8000001 |
| Maximum | 8098360 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 MiB |
Quantile statistics
| Minimum | 8000001 |
|---|---|
| 5-th percentile | 8000105 |
| Q1 | 8001582 |
| median | 8004136 |
| Q3 | 8010207 |
| 95-th percentile | 8089080 |
| Maximum | 8098360 |
| Range | 98359 |
| Interquartile range (IQR) | 8625 |
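The Range and Interquartile range rows are derived from the other quantiles in this table. A short worked check, using only the figures listed above (variable names are illustrative):

```python
# Quantiles copied from the eva_nr table above.
minimum, q1, q3, maximum = 8_000_001, 8_001_582, 8_010_207, 8_098_360

value_range = maximum - minimum  # spread of the full data
iqr = q3 - q1                    # spread of the middle 50% of rows

print(value_range)  # 98359
print(iqr)          # 8625
```

Since `eva_nr` appears to be a station identifier rather than a measurement, these spreads describe how the identifiers are numbered, not a physical quantity.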
Descriptive statistics
| Standard deviation | 31775.008 |
|---|---|
| Coefficient of variation (CV) | 0.0039628352 |
| Kurtosis | 1.0165251 |
| Mean | 8018251.2 |
| Median Absolute Deviation (MAD) | 3054 |
| Skewness | 1.7137405 |
| Sum | 1.6501497 × 10¹³ |
| Variance | 1.0096511 × 10⁹ |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 8004128 | 8732 | 0.4% |
| 8089047 | 8245 | 0.4% |
| 8000262 | 7814 | 0.4% |
| 8004132 | 7598 | 0.4% |
| 8004131 | 7382 | 0.4% |
| 8004135 | 7378 | 0.4% |
| 8004129 | 7366 | 0.4% |
| 8004136 | 7324 | 0.4% |
| 8089045 | 7085 | 0.3% |
| 8003368 | 6828 | 0.3% |
| Other values (1986) | 1982240 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 8000001 | 1488 | |
| 8000002 | 823 | < 0.1% |
| 8000004 | 848 | < 0.1% |
| 8000007 | 591 | < 0.1% |
| 8000009 | 829 | < 0.1% |
| 8000010 | 946 | |
| 8000011 | 589 | < 0.1% |
| 8000012 | 896 | < 0.1% |
| 8000013 | 2337 | |
| 8000014 | 756 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 8098360 | 531 | < 0.1% |
| 8089537 | 2180 | 0.1% |
| 8089474 | 5831 | |
| 8089473 | 1530 | 0.1% |
| 8089472 | 1538 | 0.1% |
| 8089331 | 1678 | 0.1% |
| 8089330 | 1898 | 0.1% |
| 8089329 | 1787 | 0.1% |
| 8089328 | 1916 | 0.1% |
| 8089327 | 2763 |
category
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 2 |
| 3rd row | 4 |
| 4th row | 5 |
| 5th row | 5 |
Common Values
| Value | Count | Frequency (%) |
| 4 | 786721 | |
| 5 | 642557 | |
| 3 | 420922 | |
| 2 | 137077 | 6.7% |
| 1 | 70715 | 3.4% |
Words
| Value | Count | Frequency (%) |
| 4 | 786721 | |
| 5 | 642557 | |
| 3 | 420922 | |
| 2 | 137077 | 6.7% |
| 1 | 70715 | 3.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| 4 | 786721 | |
| 5 | 642557 | |
| 3 | 420922 | |
| 2 | 137077 | 6.7% |
| 1 | 70715 | 3.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2057992 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2057992 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2057992 |
station
Text
| Distinct | 1996 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 MiB |
Length
| Max length | 42 |
|---|---|
| Median length | 30 |
| Mean length | 14.651397 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Aachen Hbf |
|---|---|
| 2nd row | Aachen Hbf |
| 3rd row | Aachen-Rothe Erde |
| 4th row | Aachen West |
| 5th row | Aachen West |
Words
| Value | Count | Frequency (%) |
| hbf | 186959 | 5.9% |
| münchen | 63047 | 2.0% |
| main | 62404 | 2.0% |
| frankfurt | 54552 | 1.7% |
| straße | 39315 | 1.2% |
| berlin | 34038 | 1.1% |
| stuttgart | 27209 | 0.9% |
| bad | 27077 | 0.8% |
| köln | 25766 | 0.8% |
| ost | 25294 | 0.8% |
| Other values (2079) | 2639997 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 3523672 | 11.7% |
| r | 2329597 | 7.7% |
| n | 2327081 | 7.7% |
| a | 1803803 | 6.0% |
| t | 1456625 | 4.8% |
| i | 1322166 | 4.4% |
| l | 1289613 | 4.3% |
| s | 1286458 | 4.3% |
| h | 1199314 | 4.0% |
| (space) | 1127666 | 3.7% |
| Other values (53) | 12486462 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 30152457 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 30152457 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 30152457 |
state
Categorical
High correlation 
| Distinct | 17 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 MiB |
Length
| Max length | 22 |
|---|---|
| Median length | 19 |
| Mean length | 10.957535 |
| Min length | 6 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | Nordrhein-Westfalen |
|---|---|
| 2nd row | Nordrhein-Westfalen |
| 3rd row | Nordrhein-Westfalen |
| 4th row | Nordrhein-Westfalen |
| 5th row | Nordrhein-Westfalen |
Common Values
| Value | Count | Frequency (%) |
| Nordrhein-Westfalen | 342558 | |
| Berlin | 334037 | |
| Bayern | 329968 | |
| Baden-Württemberg | 252714 | |
| Hessen | 200022 | |
| Hamburg | 154711 | |
| Sachsen | 84676 | 4.1% |
| Niedersachsen | 82602 | 4.0% |
| Rheinland-Pfalz | 78824 | 3.8% |
| Brandenburg | 58863 | 2.9% |
| Other values (7) | 139017 |
Words
| Value | Count | Frequency (%) |
| nordrhein-westfalen | 342558 | |
| berlin | 334037 | |
| bayern | 329968 | |
| baden-württemberg | 252714 | |
| hessen | 200022 | |
| hamburg | 154711 | |
| sachsen | 84676 | 4.1% |
| niedersachsen | 82602 | 4.0% |
| rheinland-pfalz | 78824 | 3.8% |
| brandenburg | 58863 | 2.9% |
| Other values (7) | 139017 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 3534415 | |
| n | 2458846 | 10.9% |
| r | 2327711 | 10.3% |
| a | 1577652 | 7.0% |
| s | 1095569 | 4.9% |
| B | 986010 | 4.4% |
| l | 977680 | 4.3% |
| i | 931200 | 4.1% |
| t | 915064 | 4.1% |
| d | 832820 | 3.7% |
| Other values (25) | 6913553 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 22550520 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 22550520 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 22550520 |
city
Text
| Distinct | 1292 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Memory size | 15.7 MiB |
Length
| Max length | 25 |
|---|---|
| Median length | 23 |
| Mean length | 8.9922755 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Aachen |
|---|---|
| 2nd row | Aachen |
| 3rd row | Aachen |
| 4th row | Aachen |
| 5th row | Aachen |
Words
| Value | Count | Frequency (%) |
| berlin | 335199 | 13.8% |
| hamburg | 154711 | 6.4% |
| münchen | 118039 | 4.9% |
| main | 87651 | 3.6% |
| am | 82299 | 3.4% |
| frankfurt | 69186 | 2.9% |
| köln | 42863 | 1.8% |
| stuttgart | 41450 | 1.7% |
| düsseldorf | 38327 | 1.6% |
| bad | 28394 | 1.2% |
| Other values (1345) | 1424922 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 2101384 | 11.4% |
| n | 1842040 | 10.0% |
| r | 1620113 | 8.8% |
| a | 1150130 | 6.2% |
| i | 1086512 | 5.9% |
| l | 883160 | 4.8% |
| t | 711947 | 3.8% |
| u | 687371 | 3.7% |
| h | 638318 | 3.4% |
| g | 635350 | 3.4% |
| Other values (50) | 7149697 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 18506022 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 18506022 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 18506022 |
zip
Real number (ℝ)
High correlation 
| Distinct | 1651 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 46284.135 |
| Minimum | 1067 |
| Maximum | 99974 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 MiB |
Quantile statistics
| Minimum | 1067 |
|---|---|
| 5-th percentile | 7745 |
| Q1 | 18119 |
| median | 47051 |
| Q3 | 70806 |
| 95-th percentile | 88427 |
| Maximum | 99974 |
| Range | 98907 |
| Interquartile range (IQR) | 52687 |
Descriptive statistics
| Standard deviation | 28213.241 |
|---|---|
| Coefficient of variation (CV) | 0.60956614 |
| Kurtosis | -1.3679508 |
| Mean | 46284.135 |
| Median Absolute Deviation (MAD) | 26198 |
| Skewness | 0.045418452 |
| Sum | 9.5252333 × 10¹⁰ |
| Variance | 7.9598699 × 10⁸ |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 80331 | 22358 | 1.1% |
| 80639 | 13196 | 0.6% |
| 10557 | 12961 | 0.6% |
| 14057 | 11885 | 0.6% |
| 10827 | 11676 | 0.6% |
| 10117 | 11643 | 0.6% |
| 60313 | 11368 | 0.6% |
| 22525 | 9877 | 0.5% |
| 10317 | 9631 | 0.5% |
| 20354 | 9614 | 0.5% |
| Other values (1641) | 1933782 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1067 | 2458 | |
| 1069 | 2045 | |
| 1097 | 3301 | |
| 1109 | 1799 | |
| 1127 | 597 | < 0.1% |
| 1129 | 1882 | |
| 1159 | 982 | < 0.1% |
| 1187 | 566 | < 0.1% |
| 1219 | 917 | < 0.1% |
| 1237 | 1944 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 99974 | 421 | |
| 99947 | 453 | |
| 99880 | 424 | |
| 99867 | 494 | |
| 99817 | 453 | |
| 99752 | 252 | |
| 99734 | 496 | |
| 99610 | 353 | |
| 99518 | 279 | |
| 99510 | 360 |
long
Real number (ℝ)
High correlation 
| Distinct | 1995 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.183157 |
| Minimum | 6.070715 |
| Maximum | 14.97908 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 MiB |
Quantile statistics
| Minimum | 6.070715 |
|---|---|
| 5-th percentile | 6.815137 |
| Q1 | 8.494709 |
| median | 9.944088 |
| Q3 | 12.090548 |
| 95-th percentile | 13.513799 |
| Maximum | 14.97908 |
| Range | 8.908365 |
| Interquartile range (IQR) | 3.595839 |
Descriptive statistics
| Standard deviation | 2.2735233 |
|---|---|
| Coefficient of variation (CV) | 0.2232631 |
| Kurtosis | -1.2260254 |
| Mean | 10.183157 |
| Median Absolute Deviation (MAD) | 1.694189 |
| Skewness | 0.11328713 |
| Sum | 20956846 |
| Variance | 5.1689083 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 11.536537 | 8732 | 0.4% |
| 13.283966 | 8245 | 0.4% |
| 11.604971 | 7814 | 0.4% |
| 11.565619 | 7598 | 0.4% |
| 11.583234 | 7382 | 0.4% |
| 11.575386 | 7378 | 0.4% |
| 11.548572 | 7366 | 0.4% |
| 11.593049 | 7324 | 0.4% |
| 13.451646 | 7085 | 0.3% |
| 6.975001 | 6828 | 0.3% |
| Other values (1985) | 1982239 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 6.070715 | 1744 | |
| 6.07384 | 1206 | |
| 6.074485 | 1051 | |
| 6.091499 | 1488 | |
| 6.094486 | 1899 | |
| 6.097265 | 815 | |
| 6.116475 | 949 | |
| 6.124518 | 818 | |
| 6.203225 | 252 | < 0.1% |
| 6.207467 | 717 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 14.97908 | 608 | |
| 14.902088 | 272 | < 0.1% |
| 14.805774 | 577 | |
| 14.706775 | 348 | |
| 14.671941 | 461 | |
| 14.658435 | 480 | |
| 14.648866 | 264 | < 0.1% |
| 14.638027 | 266 | < 0.1% |
| 14.578802 | 280 | < 0.1% |
| 14.546496 | 716 |
lat
Real number (ℝ)
High correlation 
| Distinct | 1996 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 50.882081 |
| Minimum | 47.411032 |
| Maximum | 54.906839 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 MiB |
Quantile statistics
| Minimum | 47.411032 |
|---|---|
| 5-th percentile | 48.111413 |
| Q1 | 49.353291 |
| median | 51.087456 |
| Q3 | 52.478542 |
| 95-th percentile | 53.564711 |
| Maximum | 54.906839 |
| Range | 7.495807 |
| Interquartile range (IQR) | 3.125251 |
Descriptive statistics
| Standard deviation | 1.7921961 |
|---|---|
| Coefficient of variation (CV) | 0.035222539 |
| Kurtosis | -1.1316864 |
| Mean | 50.882081 |
| Median Absolute Deviation (MAD) | 1.42371 |
| Skewness | -0.1180777 |
| Sum | 1.0471487 × 10⁸ |
| Variance | 3.2119669 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 48.142623 | 8732 | 0.4% |
| 52.500737 | 8245 | 0.4% |
| 48.12744 | 7814 | 0.4% |
| 48.139452 | 7598 | 0.4% |
| 48.134202 | 7382 | 0.4% |
| 48.137048 | 7378 | 0.4% |
| 48.141969 | 7366 | 0.4% |
| 48.129168 | 7324 | 0.4% |
| 52.505976 | 7085 | 0.3% |
| 50.940874 | 6828 | 0.3% |
| Other values (1986) | 1982239 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 47.411032 | 220 | < 0.1% |
| 47.44003 | 237 | < 0.1% |
| 47.456591 | 449 | |
| 47.491452 | 419 | |
| 47.513241 | 472 | |
| 47.544341 | 565 | |
| 47.5509 | 368 | |
| 47.552384 | 874 | |
| 47.555857 | 484 | |
| 47.556923 | 608 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 54.906839 | 233 | |
| 54.888814 | 364 | |
| 54.872142 | 371 | |
| 54.861997 | 381 | |
| 54.789605 | 563 | |
| 54.774039 | 281 | |
| 54.685934 | 373 | |
| 54.621166 | 311 | |
| 54.499457 | 515 | |
| 54.4720826 | 537 |
arrival_plan
Date
Missing 
| Distinct | 10084 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 211069 |
| Missing (%) | 10.3% |
| Memory size | 15.7 MiB |
| Minimum | 2024-07-07 23:37:00 |
|---|---|
| Maximum | 2024-07-14 23:59:00 |
departure_plan
Date
| Distinct | 10089 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Memory size | 15.7 MiB |
| Minimum | 2024-07-08 00:00:00 |
|---|---|
| Maximum | 2024-07-15 00:10:00 |
arrival_change
Date
Missing 
| Distinct | 10114 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 474922 |
| Missing (%) | 23.1% |
| Memory size | 15.7 MiB |
| Minimum | 2024-07-07 23:39:00 |
|---|---|
| Maximum | 2024-07-15 01:03:00 |
departure_change
Date
Missing 
| Distinct | 10108 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 339378 |
| Missing (%) | 16.5% |
| Memory size | 15.7 MiB |
| Minimum | 2024-07-08 00:00:00 |
|---|---|
| Maximum | 2024-07-15 01:04:00 |
arrival_delay_m
Real number (ℝ)
High correlation  Zeros 
| Distinct | 116 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.1767087 |
| Minimum | 0 |
| Maximum | 159 |
| Zeros | 1404435 |
| Zeros (%) | 68.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 6 |
| Maximum | 159 |
| Range | 159 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 3.4052415 |
|---|---|
| Coefficient of variation (CV) | 2.8938695 |
| Kurtosis | 106.92896 |
| Mean | 1.1767087 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.667469 |
| Sum | 2421656 |
| Variance | 11.59567 |
| Monotonicity | Not monotonic |
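Several rows of the descriptive statistics are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of non-missing rows. The sketch below checks this with the figures from this report. The row count comes from the Overview and the missing count from this variable's summary, and the variable names are illustrative.

```python
# Figures copied from the arrival_delay_m tables above and the Overview.
mean = 1.1767087
std = 3.4052415
n_rows = 2_057_992
n_missing = 1

cv = std / mean                      # coefficient of variation
variance = std ** 2                  # variance is the squared standard deviation
total = mean * (n_rows - n_missing)  # sum over the non-missing values

print(f"CV: {cv:.4f}")
print(f"Variance: {variance:.4f}")
print(f"Sum: {total:.0f}")
```

Small differences in the last digits are expected because the inputs are already rounded in the report.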
Common Values
| Value | Count | Frequency (%) |
| 0 | 1404435 | |
| 1 | 254614 | 12.4% |
| 2 | 130275 | 6.3% |
| 3 | 80385 | 3.9% |
| 4 | 46057 | 2.2% |
| 5 | 31418 | 1.5% |
| 6 | 21921 | 1.1% |
| 7 | 15746 | 0.8% |
| 8 | 12240 | 0.6% |
| 9 | 9843 | 0.5% |
| Other values (106) | 51057 | 2.5% |
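The value counts above can be cross-checked against the `arrival_delay_check` variable profiled later in this report, which records 110807 rows as `delay` and 1947184 as `on_time`. The sketch below sums the counts at or above a candidate threshold. The threshold itself is an inference from the counts, not something the report states, and the code assumes every value in "Other values (106)" exceeds 9 minutes.

```python
# Value counts copied from the arrival_delay_m table above.
# Key 10 stands in for "Other values (106)", assumed to all exceed 9 minutes.
counts = {
    0: 1_404_435, 1: 254_614, 2: 130_275, 3: 80_385, 4: 46_057,
    5: 31_418, 6: 21_921, 7: 15_746, 8: 12_240, 9: 9_843,
    10: 51_057,
}

DELAY_COUNT_REPORTED = 110_807  # 'delay' rows in arrival_delay_check


def rows_at_or_above(threshold):
    """Count rows whose delay bucket is >= threshold minutes."""
    return sum(n for minutes, n in counts.items() if minutes >= threshold)


for threshold in (5, 6, 7):
    n = rows_at_or_above(threshold)
    print(threshold, n, n == DELAY_COUNT_REPORTED)
```

Of the thresholds tried, only 6 minutes reproduces the reported 110807 `delay` rows, and the remaining 1947184 rows match the `on_time` count. This suggests, but does not prove, that the check flags delays of 6 minutes or more.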
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 1404435 | |
| 1 | 254614 | 12.4% |
| 2 | 130275 | 6.3% |
| 3 | 80385 | 3.9% |
| 4 | 46057 | 2.2% |
| 5 | 31418 | 1.5% |
| 6 | 21921 | 1.1% |
| 7 | 15746 | 0.8% |
| 8 | 12240 | 0.6% |
| 9 | 9843 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 159 | 1 | < 0.1% |
| 157 | 2 | |
| 140 | 1 | < 0.1% |
| 136 | 1 | < 0.1% |
| 134 | 1 | < 0.1% |
| 133 | 3 | |
| 132 | 1 | < 0.1% |
| 120 | 1 | < 0.1% |
| 117 | 1 | < 0.1% |
| 116 | 3 |
departure_delay_m
Real number (ℝ)
High correlation  Zeros 
| Distinct | 121 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.2236404 |
| Minimum | 0 |
| Maximum | 159 |
| Zeros | 1335764 |
| Zeros (%) | 64.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 15.7 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 6 |
| Maximum | 159 |
| Range | 159 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 3.4155783 |
|---|---|
| Coefficient of variation (CV) | 2.7913251 |
| Kurtosis | 107.07563 |
| Mean | 1.2236404 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.6630691 |
| Sum | 2518241 |
| Variance | 11.666175 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1335764 | |
| 1 | 305641 | 14.9% |
| 2 | 146180 | 7.1% |
| 3 | 81436 | 4.0% |
| 4 | 46490 | 2.3% |
| 5 | 31286 | 1.5% |
| 6 | 21722 | 1.1% |
| 7 | 15693 | 0.8% |
| 8 | 12271 | 0.6% |
| 9 | 9764 | 0.5% |
| Other values (111) | 51744 | 2.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 1335764 | |
| 1 | 305641 | 14.9% |
| 2 | 146180 | 7.1% |
| 3 | 81436 | 4.0% |
| 4 | 46490 | 2.3% |
| 5 | 31286 | 1.5% |
| 6 | 21722 | 1.1% |
| 7 | 15693 | 0.8% |
| 8 | 12271 | 0.6% |
| 9 | 9764 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 159 | 1 | |
| 157 | 1 | |
| 156 | 1 | |
| 137 | 1 | |
| 135 | 1 | |
| 134 | 2 | |
| 133 | 1 | |
| 132 | 2 | |
| 131 | 1 | |
| 120 | 1 |
info
Categorical
High correlation  Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1414133 |
| Missing (%) | 68.7% |
| Memory size | 15.7 MiB |
Length
| Max length | 34 |
|---|---|
| Median length | 11 |
| Mean length | 16.535776 |
| Min length | 7 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Bauarbeiten. (Quelle: zuginfo.nrw) |
|---|---|
| 2nd row | Information |
| 3rd row | Information |
| 4th row | Information |
| 5th row | Information |
Common Values
| Value | Count | Frequency (%) |
| Information | 243525 | 11.8% |
| Störung | 115698 | 5.6% |
| Bauarbeiten | 96154 | 4.7% |
| Information. (Quelle: zuginfo.nrw) | 78925 | 3.8% |
| Bauarbeiten. (Quelle: zuginfo.nrw) | 72472 | 3.5% |
| Störung. (Quelle: zuginfo.nrw) | 28680 | 1.4% |
| Großstörung | 8405 | 0.4% |
| (Missing) | 1414133 |
Words
| Value | Count | Frequency (%) |
| information | 322450 | |
| quelle | 180077 | |
| zuginfo.nrw | 180077 | |
| bauarbeiten | 168626 | |
| störung | 144378 | |
| großstörung | 8405 | 0.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 1326463 | 12.5% |
| o | 833382 | 7.8% |
| r | 832341 | 7.8% |
| e | 697406 | 6.6% |
| u | 681563 | 6.4% |
| i | 671153 | 6.3% |
| a | 659702 | 6.2% |
| t | 643859 | 6.0% |
| f | 502527 | 4.7% |
| l | 360154 | 3.4% |
| Other values (18) | 3438158 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 10646708 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 10646708 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| n | 1326463 | 12.5% |
| o | 833382 | 7.8% |
| r | 832341 | 7.8% |
| e | 697406 | 6.6% |
| u | 681563 | 6.4% |
| i | 671153 | 6.3% |
| a | 659702 | 6.2% |
| t | 643859 | 6.0% |
| f | 502527 | 4.7% |
| l | 360154 | 3.4% |
| Other values (18) | 3438158 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 10646708 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| n | 1326463 | 12.5% |
| o | 833382 | 7.8% |
| r | 832341 | 7.8% |
| e | 697406 | 6.6% |
| u | 681563 | 6.4% |
| i | 671153 | 6.3% |
| a | 659702 | 6.2% |
| t | 643859 | 6.0% |
| f | 502527 | 4.7% |
| l | 360154 | 3.4% |
| Other values (18) | 3438158 |
arrival_delay_check
Categorical
High correlation  Imbalance 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Memory size | 15.7 MiB |
| Value | Count |
|---|---|
| on_time | 1947184 |
| delay | 110807 |
Length
| Max length | 7 |
|---|---|
| Median length | 7 |
| Mean length | 6.8923154 |
| Min length | 5 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | on_time |
|---|---|
| 2nd row | on_time |
| 3rd row | on_time |
| 4th row | on_time |
| 5th row | on_time |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| on_time | 1947184 | 94.6% |
| delay | 110807 | 5.4% |
| (Missing) | 1 | < 0.1% |
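The Imbalance flag on this column can be related to the class counts above. One common definition of an imbalance score is one minus the Shannon entropy of the class distribution divided by its maximum possible entropy. Whether the report uses exactly this formula is an assumption here, although the result for these counts lands very close to the 69.8% quoted in the report's alerts. The function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    ASSUMPTION: this normalised-entropy definition is illustrative and may not be
    the exact formula used by the profiling tool.
    """
    classes = [c for c in counts if c > 0]
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    total = sum(classes)
    entropy = -sum((c / total) * math.log2(c / total) for c in classes)
    return 1.0 - entropy / math.log2(len(counts))


# on_time and delay counts for arrival_delay_check, from the table above.
arrival_imbalance = imbalance_score([1947184, 110807])
print(f"{arrival_imbalance:.4f}")
```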
Words
| Value | Count | Frequency (%) |
|---|---|---|
| on_time | 1947184 | 94.6% |
| delay | 110807 | 5.4% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 2057991 | 14.5% |
| o | 1947184 | 13.7% |
| n | 1947184 | 13.7% |
| _ | 1947184 | 13.7% |
| t | 1947184 | 13.7% |
| i | 1947184 | 13.7% |
| m | 1947184 | 13.7% |
| d | 110807 | 0.8% |
| l | 110807 | 0.8% |
| a | 110807 | 0.8% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 14184323 | 100.0% |
departure_delay_check
Categorical
High correlation  Imbalance 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Memory size | 15.7 MiB |
| Value | Count |
|---|---|
| on_time | 1946797 |
| delay | 111194 |
Length
| Max length | 7 |
|---|---|
| Median length | 7 |
| Mean length | 6.8919393 |
| Min length | 5 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | on_time |
|---|---|
| 2nd row | on_time |
| 3rd row | on_time |
| 4th row | on_time |
| 5th row | on_time |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| on_time | 1946797 | 94.6% |
| delay | 111194 | 5.4% |
| (Missing) | 1 | < 0.1% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| on_time | 1946797 | 94.6% |
| delay | 111194 | 5.4% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 2057991 | 14.5% |
| o | 1946797 | 13.7% |
| n | 1946797 | 13.7% |
| _ | 1946797 | 13.7% |
| t | 1946797 | 13.7% |
| i | 1946797 | 13.7% |
| m | 1946797 | 13.7% |
| d | 111194 | 0.8% |
| l | 111194 | 0.8% |
| a | 111194 | 0.8% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 14183549 | 100.0% |
Interactions
Correlations
| | arrival_delay_check | arrival_delay_m | category | departure_delay_check | departure_delay_m | eva_nr | info | lat | long | state | zip |
|---|---|---|---|---|---|---|---|---|---|---|---|
| arrival_delay_check | 1.000 | 0.432 | 0.031 | 0.910 | 0.414 | 0.086 | 0.086 | 0.096 | 0.102 | 0.114 | 0.107 |
| arrival_delay_m | 0.432 | 1.000 | 0.010 | 0.428 | 0.824 | -0.086 | 0.027 | -0.251 | -0.105 | 0.019 | 0.226 |
| category | 0.031 | 0.010 | 1.000 | 0.032 | 0.011 | 0.163 | 0.127 | 0.176 | 0.175 | 0.209 | 0.158 |
| departure_delay_check | 0.910 | 0.428 | 0.032 | 1.000 | 0.434 | 0.087 | 0.085 | 0.096 | 0.102 | 0.114 | 0.107 |
| departure_delay_m | 0.414 | 0.824 | 0.011 | 0.434 | 1.000 | -0.094 | 0.027 | -0.270 | -0.111 | 0.019 | 0.245 |
| eva_nr | 0.086 | -0.086 | 0.163 | 0.087 | -0.094 | 1.000 | 0.313 | 0.348 | 0.654 | 0.706 | -0.530 |
| info | 0.086 | 0.027 | 0.127 | 0.085 | 0.027 | 0.313 | 1.000 | 0.515 | 0.563 | 0.577 | 0.540 |
| lat | 0.096 | -0.251 | 0.176 | 0.096 | -0.270 | 0.348 | 0.515 | 1.000 | 0.258 | 0.688 | -0.833 |
| long | 0.102 | -0.105 | 0.175 | 0.102 | -0.111 | 0.654 | 0.563 | 0.258 | 1.000 | 0.650 | -0.410 |
| state | 0.114 | 0.019 | 0.209 | 0.114 | 0.019 | 0.706 | 0.577 | 0.688 | 0.650 | 1.000 | 0.696 |
| zip | 0.107 | 0.226 | 0.158 | 0.107 | 0.245 | -0.530 | 0.540 | -0.833 | -0.410 | 0.696 | 1.000 |
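The matrix above mixes numeric and categorical variables, so a single coefficient such as Pearson's r cannot apply to every cell. For pairs of categorical variables, such as `arrival_delay_check` and `departure_delay_check` (0.910), a measure of association like Cramér's V is a common choice. Which measure the report applies to each pair is an assumption here. The sketch below implements the plain, uncorrected Cramér's V; the function name and the contingency counts are hypothetical and are not taken from the dataset.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows.

    Assumes every row and column total is non-zero, so no expected count is 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# HYPOTHETICAL counts: rows = arrival on_time/delay, columns = departure on_time/delay.
toy_table = [
    [900, 10],
    [10, 80],
]
print(round(cramers_v(toy_table), 3))
```

A value of 0 indicates no association and 1 indicates that one variable fully determines the other, so unlike the numeric coefficients in the matrix it is never negative.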
Missing values
Sample
| | ID | line | path | eva_nr | category | station | state | city | zip | long | lat | arrival_plan | departure_plan | arrival_change | departure_change | arrival_delay_m | departure_delay_m | info | arrival_delay_check | departure_delay_check |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1573967790757085557-2407072312-14 | 20 | Stolberg(Rheinl)Hbf Gl.44|Eschweiler-St.Jöris|Alsdorf Poststraße|Alsdorf-Mariadorf|Alsdorf-Kellersberg|Alsdorf-Annapark|Alsdorf-Busch|Herzogenrath-August-Schmidt-Platz|Herzogenrath-Alt-Merkstein|Herzogenrath|Kohlscheid|Aachen West|Aachen Schanz | 8000001 | 2 | Aachen Hbf | Nordrhein-Westfalen | Aachen | 52064.0 | 6.091499 | 50.767800 | 2024-07-08 00:00:00 | 2024-07-08 00:01:00 | 2024-07-08 00:03:00 | 2024-07-08 00:04:00 | 3.0 | 3.0 | NaN | on_time | on_time |
| 1 | 349781417030375472-2407080017-1 | 18 | NaN | 8000001 | 2 | Aachen Hbf | Nordrhein-Westfalen | Aachen | 52064.0 | 6.091499 | 50.767800 | NaN | 2024-07-08 00:17:00 | NaN | NaN | 0.0 | 0.0 | NaN | on_time | on_time |
| 2 | 7157250219775883918-2407072120-25 | 1 | Hamm(Westf)Hbf|Kamen|Kamen-Methler|Dortmund-Kurl|Dortmund-Scharnhorst|Dortmund Hbf|Bochum Hbf|Wattenscheid|Essen Hbf|Mülheim(Ruhr)Hbf|Duisburg Hbf|Düsseldorf Flughafen|Düsseldorf Hbf|Düsseldorf-Benrath|Leverkusen Mitte|Köln-Mülheim|Köln Messe/Deutz|Köln Hbf|Köln-Ehrenfeld|Horrem|Düren|Langerwehe|Eschweiler Hbf|Stolberg(Rheinl)Hbf | 8000406 | 4 | Aachen-Rothe Erde | Nordrhein-Westfalen | Aachen | 52066.0 | 6.116475 | 50.770202 | 2024-07-08 00:03:00 | 2024-07-08 00:04:00 | 2024-07-08 00:03:00 | 2024-07-08 00:04:00 | 0.0 | 0.0 | NaN | on_time | on_time |
| 3 | 349781417030375472-2407080017-2 | 18 | Aachen Hbf | 8000404 | 5 | Aachen West | Nordrhein-Westfalen | Aachen | 52072.0 | 6.070715 | 50.780360 | 2024-07-08 00:20:00 | 2024-07-08 00:21:00 | NaN | NaN | 0.0 | 0.0 | NaN | on_time | on_time |
| 4 | 1983158592123451570-2407080010-3 | 33 | Herzogenrath|Kohlscheid | 8000404 | 5 | Aachen West | Nordrhein-Westfalen | Aachen | 52072.0 | 6.070715 | 50.780360 | 2024-07-08 00:20:00 | 2024-07-08 00:21:00 | 2024-07-08 00:20:00 | 2024-07-08 00:21:00 | 0.0 | 0.0 | NaN | on_time | on_time |
| 5 | -5293934437045765939-2407080023-2 | 4 | Herzogenrath | 8000404 | 5 | Aachen West | Nordrhein-Westfalen | Aachen | 52072.0 | 6.070715 | 50.780360 | 2024-07-08 00:30:00 | 2024-07-08 00:31:00 | 2024-07-08 00:30:00 | 2024-07-08 00:31:00 | 0.0 | 0.0 | Bauarbeiten. (Quelle: zuginfo.nrw) | on_time | on_time |
| 6 | 6845762881043426854-2407072357-6 | RB33 | Lindern|Geilenkirchen|Übach-Palenberg|Herzogenrath|Kohlscheid | 8000404 | 5 | Aachen West | Nordrhein-Westfalen | Aachen | 52072.0 | 6.070715 | 50.780360 | 2024-07-08 00:58:00 | 2024-07-08 00:58:00 | NaN | NaN | 0.0 | 0.0 | NaN | on_time | on_time |
| 7 | -2100556839975301087-2407072307-13 | 18 | Liège-Guillemins|Bressoux|Vise|Eijsden|Maastricht Randwyck|Maastricht|Meerssen|Valkenburg(NL)|Heerlen|Landgraaf|Eygelshoven Markt|Herzogenrath | 8000404 | 5 | Aachen West | Nordrhein-Westfalen | Aachen | 52072.0 | 6.070715 | 50.780360 | 2024-07-08 00:37:00 | 2024-07-08 00:41:00 | 2024-07-08 00:37:00 | 2024-07-08 00:41:00 | 0.0 | 0.0 | NaN | on_time | on_time |
| 8 | -7696913984968518161-2407080037-1 | 13 | NaN | 8000002 | 3 | Aalen Hbf | Baden-Württemberg | Aalen | 73430.0 | 10.096271 | 48.841013 | NaN | 2024-07-08 00:37:00 | NaN | 2024-07-08 00:37:00 | 0.0 | 0.0 | Information | on_time | on_time |
| 9 | -6027587483204218492-2407080013-4 | 8 | Bremen Hbf|Bremen-Sebaldsbrück|Bremen-Mahndorf | 8000413 | 4 | Achim | Niedersachsen | Achim | 28832.0 | 9.030447 | 53.015990 | 2024-07-08 00:27:00 | 2024-07-08 00:27:00 | 2024-07-08 01:16:00 | 2024-07-08 01:17:00 | 49.0 | 50.0 | NaN | delay | delay |
| | ID | line | path | eva_nr | category | station | state | city | zip | long | lat | arrival_plan | departure_plan | arrival_change | departure_change | arrival_delay_m | departure_delay_m | info | arrival_delay_check | departure_delay_check |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2057982 | -7544299009042287777-2407142237-9 | 1 | Villingen(Schwarzw)|Donaueschingen|Hüfingen Mitte|Döggingen|Unadingen|Bachheim|Löffingen|Rötenbach(Baden) | 8004331 | 5 | Neustadt (Schwarzw) | Baden-Württemberg | Neustadt | 79822.0 | 8.210883 | 47.910198 | 2024-07-14 23:27:00 | 2024-07-14 23:28:00 | 2024-07-14 23:27:00 | 2024-07-14 23:28:00 | 0.0 | 0.0 | Bauarbeiten | on_time | on_time |
| 2057983 | 6612581188908164643-2407142033-17 | 1 | Trier Hbf|Konz|Saarburg(Bz Trier)|Mettlach|Merzig(Saar)|Dillingen(Saar)|Saarlouis Hbf|Völklingen|Saarbrücken Hbf|St Ingbert|Homburg(Saar)Hbf|Landstuhl|Kaiserslautern Hbf|Hochspeyer|Lambrecht(Pfalz)|Neustadt(Weinstr)Hbf | 8004489 | 4 | Neustadt (Weinstr) Böbig | Rheinland-Pfalz | Neustadt | 67433.0 | 8.158313 | 49.354245 | 2024-07-14 23:01:00 | 2024-07-14 23:02:00 | 2024-07-14 23:02:00 | 2024-07-14 23:02:00 | 1.0 | 0.0 | NaN | on_time | on_time |
| 2057984 | 4875328236568005990-2407142300-8 | 2 | Kaiserslautern Hbf|Hochspeyer|Frankenstein(Pfalz)|Weidenthal|Neidenfels|Lambrecht(Pfalz)|Neustadt(Weinstr)Hbf | 8004489 | 4 | Neustadt (Weinstr) Böbig | Rheinland-Pfalz | Neustadt | 67433.0 | 8.158313 | 49.354245 | 2024-07-14 23:33:00 | 2024-07-14 23:34:00 | 2024-07-14 23:34:00 | 2024-07-14 23:35:00 | 1.0 | 1.0 | NaN | on_time | on_time |
| 2057985 | 788723344751923009-2407142304-4 | 2 | Schifferstadt|Böhl-Iggelheim|Haßloch(Pfalz) | 8004489 | 4 | Neustadt (Weinstr) Böbig | Rheinland-Pfalz | Neustadt | 67433.0 | 8.158313 | 49.354245 | 2024-07-14 23:18:00 | 2024-07-14 23:18:00 | 2024-07-14 23:18:00 | 2024-07-14 23:19:00 | 0.0 | 1.0 | NaN | on_time | on_time |
| 2057986 | 6612581188908164643-2407142033-16 | 1 | Trier Hbf|Konz|Saarburg(Bz Trier)|Mettlach|Merzig(Saar)|Dillingen(Saar)|Saarlouis Hbf|Völklingen|Saarbrücken Hbf|St Ingbert|Homburg(Saar)Hbf|Landstuhl|Kaiserslautern Hbf|Hochspeyer|Lambrecht(Pfalz) | 8000275 | 2 | Neustadt (Weinstr) Hbf | Rheinland-Pfalz | Neustadt | 67434.0 | 8.140757 | 49.349553 | 2024-07-14 22:56:00 | 2024-07-14 23:00:00 | 2024-07-14 22:56:00 | 2024-07-14 23:00:00 | 0.0 | 0.0 | NaN | on_time | on_time |
| 2057987 | 4875328236568005990-2407142300-7 | 2 | Kaiserslautern Hbf|Hochspeyer|Frankenstein(Pfalz)|Weidenthal|Neidenfels|Lambrecht(Pfalz) | 8000275 | 2 | Neustadt (Weinstr) Hbf | Rheinland-Pfalz | Neustadt | 67434.0 | 8.140757 | 49.349553 | 2024-07-14 23:30:00 | 2024-07-14 23:32:00 | 2024-07-14 23:30:00 | 2024-07-14 23:33:00 | 0.0 | 1.0 | NaN | on_time | on_time |
| 2057988 | 2971209219135860640-2407142336-1 | 6 | NaN | 8000275 | 2 | Neustadt (Weinstr) Hbf | Rheinland-Pfalz | Neustadt | 67434.0 | 8.140757 | 49.349553 | NaN | 2024-07-14 23:36:00 | NaN | 2024-07-14 23:36:00 | 0.0 | 0.0 | NaN | on_time | on_time |
| 2057989 | 788723344751923009-2407142304-5 | 2 | Schifferstadt|Böhl-Iggelheim|Haßloch(Pfalz)|Neustadt-Böbig | 8000275 | 2 | Neustadt (Weinstr) Hbf | Rheinland-Pfalz | Neustadt | 67434.0 | 8.140757 | 49.349553 | 2024-07-14 23:21:00 | 2024-07-14 23:30:00 | 2024-07-14 23:21:00 | 2024-07-14 23:30:00 | 0.0 | 0.0 | NaN | on_time | on_time |
| 2057990 | 8280296046192255306-2407142239-3 | 1 | Mannheim Hbf|Ludwigshafen(Rhein) Mitte | 8000275 | 2 | Neustadt (Weinstr) Hbf | Rheinland-Pfalz | Neustadt | 67434.0 | 8.140757 | 49.349553 | 2024-07-14 23:00:00 | 2024-07-14 23:02:00 | 2024-07-14 23:05:00 | 2024-07-14 23:06:00 | 5.0 | 4.0 | NaN | on_time | on_time |
| 2057991 | -2936232225014219596-2407142332-1 | S2 | NaN | 8004322 | 4 | Neustadt a Rübenberge | Nieder | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
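In the sample rows, `arrival_delay_m` and `departure_delay_m` match the difference in minutes between the `*_change` and `*_plan` timestamps, and rows with a missing timestamp show 0.0. The sketch below reproduces that relationship with the standard library. The function names are hypothetical, and the 5-minute cut-off for `delay` is an assumption that is merely consistent with the rows shown (5.0 minutes is labelled `on_time`, 49.0 minutes `delay`); the dataset's actual rule is not stated in the report.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def delay_minutes(planned, changed):
    """Minutes between the planned and changed time; 0.0 if either is missing."""
    if planned is None or changed is None:
        return 0.0
    delta = datetime.strptime(changed, TIME_FORMAT) - datetime.strptime(planned, TIME_FORMAT)
    return delta.total_seconds() / 60.0


def delay_check(minutes, threshold=5.0):
    """Label a delay. ASSUMPTION: the threshold is hypothetical, not from the report."""
    return "delay" if minutes > threshold else "on_time"


# Arrival times from sample row 9 (Achim) and row 3 (Aachen West, no change recorded).
achim = delay_minutes("2024-07-08 00:27:00", "2024-07-08 01:16:00")
aachen_west = delay_minutes("2024-07-08 00:20:00", None)
print(achim, delay_check(achim))
print(aachen_west, delay_check(aachen_west))
```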
Duplicate rows
Most frequently occurring
| | ID | line | path | eva_nr | category | station | state | city | zip | long | lat | arrival_plan | departure_plan | arrival_change | departure_change | arrival_delay_m | departure_delay_m | info | arrival_delay_check | departure_delay_check | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1003145420136048192-2407100551-6 | 51 | Gera Hbf|Hermsdorf-Klosterlausnitz|Stadtroda|Jena-Göschwitz|Jena West | 8010366 | 2 | Weimar | Thüringen | Weimar | 99423.0 | 11.326458 | 50.991487 | 2024-07-10 06:49:00 | 2024-07-10 07:04:00 | 2024-07-10 06:51:00 | 2024-07-10 07:06:00 | 2.0 | 2.0 | NaN | on_time | on_time | 2 |
| 1 | -1008819848758697010-2407130714-22 | 6 | Grafing Bahnhof|Kirchseeon|Eglharting|Zorneding|Baldham|Vaterstetten|Haar|Gronsdorf|München-Trudering|München-Berg am Laim|München Leuchtenbergring|München Ost|München Rosenheimer Platz|München Isartor|München Marienplatz|München Karlsplatz|München Hbf (tief)|München Hackerbrücke|München Donnersbergerbrücke|München Hirschgarten|München-Laim | 8004158 | 2 | München-Pasing | Bayern | München | 81241.0 | 11.461872 | 48.149852 | 2024-07-13 07:59:00 | 2024-07-13 08:01:00 | 2024-07-13 08:02:00 | 2024-07-13 08:03:00 | 3.0 | 2.0 | NaN | on_time | on_time | 2 |
| 2 | -1009540259073221553-2407142134-10 | 6 | Starnberg|Starnberg Nord|Gauting|Stockdorf|Planegg|Gräfelfing|Lochham|München-Westkreuz|München-Pasing | 8004151 | 3 | München-Laim Pbf | Bayern | München | 80639.0 | 11.503669 | 48.144371 | 2024-07-14 21:59:00 | 2024-07-14 22:00:00 | 2024-07-14 22:01:00 | 2024-07-14 22:02:00 | 2.0 | 2.0 | Bauarbeiten | on_time | on_time | 2 |
| 3 | -1010076636343338093-2407101633-11 | 6 | Köln-Worringen|Köln-Blumenberg|Köln-Chorweiler Nord|Köln-Chorweiler|Köln Volkhovener Weg|Köln-Longerich|Köln Geldernstr./Parkgürtel|Köln-Nippes|Köln Hansaring|Köln Hbf | 8003368 | 1 | Köln Messe/Deutz | Nordrhein-Westfalen | Köln | 50679.0 | 6.975001 | 50.940874 | 2024-07-10 16:59:00 | 2024-07-10 17:00:00 | 2024-07-10 16:59:00 | 2024-07-10 17:00:00 | 0.0 | 0.0 | Information. (Quelle: zuginfo.nrw) | on_time | on_time | 2 |
| 4 | -1012813851155274121-2407111424-7 | 18 | Maastricht|Meerssen|Valkenburg(NL)|Heerlen|Landgraaf|Eygelshoven Markt | 8002806 | 3 | Herzogenrath | Nordrhein-Westfalen | Herzogenrath | 52134.0 | 6.094486 | 50.870916 | 2024-07-11 14:59:00 | 2024-07-11 15:00:00 | NaN | NaN | 0.0 | 0.0 | NaN | on_time | on_time | 2 |
| 5 | -1012813851155274121-2407141424-7 | 18 | Maastricht|Meerssen|Valkenburg(NL)|Heerlen|Landgraaf|Eygelshoven Markt | 8002806 | 3 | Herzogenrath | Nordrhein-Westfalen | Herzogenrath | 52134.0 | 6.094486 | 50.870916 | 2024-07-14 14:59:00 | 2024-07-14 15:00:00 | NaN | NaN | 0.0 | 0.0 | NaN | on_time | on_time | 2 |
| 6 | -1014485518442214187-2407080436-7 | 46 | Ilmenau|Ilmenau Pörlitzer Höhe|Ilmenau-Roda|Elgersburg|Geraberg|Martinroda | 8010274 | 5 | Plaue (Thür) | Thüringen | Plaue | 99338.0 | 10.908698 | 50.778393 | 2024-07-08 04:57:00 | 2024-07-08 05:06:00 | 2024-07-08 04:57:00 | 2024-07-08 05:06:00 | 0.0 | 0.0 | NaN | on_time | on_time | 2 |
| 7 | -1014485518442214187-2407090436-7 | 46 | Ilmenau|Ilmenau Pörlitzer Höhe|Ilmenau-Roda|Elgersburg|Geraberg|Martinroda | 8010274 | 5 | Plaue (Thür) | Thüringen | Plaue | 99338.0 | 10.908698 | 50.778393 | 2024-07-09 04:57:00 | 2024-07-09 05:06:00 | 2024-07-09 04:57:00 | 2024-07-09 05:06:00 | 0.0 | 0.0 | NaN | on_time | on_time | 2 |
| 8 | -1014485518442214187-2407100436-7 | 46 | Ilmenau|Ilmenau Pörlitzer Höhe|Ilmenau-Roda|Elgersburg|Geraberg|Martinroda | 8010274 | 5 | Plaue (Thür) | Thüringen | Plaue | 99338.0 | 10.908698 | 50.778393 | 2024-07-10 04:57:00 | 2024-07-10 05:06:00 | 2024-07-10 04:57:00 | 2024-07-10 05:06:00 | 0.0 | 0.0 | NaN | on_time | on_time | 2 |
| 9 | -1014485518442214187-2407120436-7 | 46 | Ilmenau|Ilmenau Pörlitzer Höhe|Ilmenau-Roda|Elgersburg|Geraberg|Martinroda | 8010274 | 5 | Plaue (Thür) | Thüringen | Plaue | 99338.0 | 10.908698 | 50.778393 | 2024-07-12 04:57:00 | 2024-07-12 05:06:00 | 2024-07-12 04:57:00 | 2024-07-12 05:06:00 | 0.0 | 0.0 | NaN | on_time | on_time | 2 |
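A table like the one above, with a `# duplicates` count per repeated row, can be produced by grouping identical rows and keeping the groups that occur more than once. This is a minimal standard-library sketch under that assumption; the function name is hypothetical, and the rows are shortened to two fields (`ID`, `station`) for illustration.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Shortened illustrative rows: (ID, station). The first two are identical.
rows = [
    ("-1003145420136048192-2407100551-6", "Weimar"),
    ("-1003145420136048192-2407100551-6", "Weimar"),
    ("-1008819848758697010-2407130714-22", "München-Pasing"),
]

for row, occurrences in duplicate_groups(rows).items():
    print(row, occurrences)
```

Rows only count as duplicates when every field matches, so two stops of the same train on different days (different `ID` and timestamps) stay separate groups, as in the table.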